One Needs to Be Careful When Dismissing Outliers: A Realistic Example
Abstract
The traditional approach to eliminating outliers is to compute the sample mean μ and the sample standard deviation σ and then, for an appropriate value k0 = 2, 3, 6, etc., to eliminate all data points outside the interval [μ − k0·σ, μ + k0·σ] as outliers. We then repeat this procedure with the remaining data, eliminating new outliers each time, until some iteration eliminates no new outliers. In many applications, this procedure works well. However, in this paper, we provide a realistic example in which this procedure, instead of eliminating all outliers and leaving adequate data points intact, eliminates all the data points. This example shows that one needs to be careful when applying the standard outlier-eliminating procedure.

1 Formulation of the Problem

Need to eliminate outliers. In the traditional approach to data analysis, based on the sample, we estimate the means of the corresponding quantities, as well as the variances, covariances, and correlations; see, e.g., [3]. This usually works well, but sometimes we have outliers, i.e., values caused, e.g., by a malfunction of the measuring instrument. Outliers ruin the estimates. For example, if we are interested in the average temperature and, in addition to 100 measurement results around 20 °C, we have a (clearly erroneous) value of 1000 °C, then the sample average $\bar{x}$ becomes

$$\bar{x} \approx \frac{\overbrace{20 + \ldots + 20}^{100 \text{ times}} + 1000}{101} \approx 30.$$
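The iterative procedure described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function name `eliminate_outliers`, the choice k0 = 3, and the use of the (n − 1)-normalized sample standard deviation are assumptions made here, and the 100 readings are idealized as exactly 20 °C.

```python
from statistics import mean, stdev


def eliminate_outliers(data, k0=3.0):
    """Repeatedly drop points outside [mu - k0*sigma, mu + k0*sigma].

    Hypothetical helper for illustration; stops when an iteration
    removes nothing or fewer than two points remain.
    """
    remaining = list(data)
    while len(remaining) >= 2:  # stdev needs at least two points
        mu = mean(remaining)
        sigma = stdev(remaining)  # assumption: (n-1)-normalized sample std
        lo, hi = mu - k0 * sigma, mu + k0 * sigma
        kept = [x for x in remaining if lo <= x <= hi]
        if len(kept) == len(remaining):
            break  # no new outliers were eliminated on this iteration
        remaining = kept
    return remaining


# Temperature example from the text: 100 readings of 20 and one of 1000.
readings = [20.0] * 100 + [1000.0]
raw_mean = mean(readings)            # 3000 / 101, i.e. roughly 30
cleaned = eliminate_outliers(readings, k0=3.0)
clean_mean = mean(cleaned)           # mean after the erroneous value is dropped
```

In this idealized case the first pass removes the 1000 °C reading and the second pass removes nothing, so the loop stops with the 100 valid readings. The same loop makes no guarantee for other data sets, which is the caution the paper raises.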
Similar resources
Simultaneous robust estimation of multi-response surfaces in the presence of outliers
A robust approach should be considered when estimating regression coefficients in multi-response problems. Many models are derived from the least squares method. Because the presence of outlier data is unavoidable in most real cases and because the least squares method is sensitive to these types of points, robust regression approaches appear to be a more reliable and suitable method for addres...
Identification of outliers types in multivariate time series using genetic algorithm
Multivariate time series data are often modeled using a vector autoregressive moving average (VARMA) model. But the presence of outliers can violate the stationarity assumption and may lead to wrong modeling, biased estimation of parameters and inaccurate prediction. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...
BAYES PREDICTION INTERVALS FOR THE BURR TYPE XII DISTRIBUTION IN THE PRESENCE OF OUTLIERS
Using a sample from the Burr type XII distribution, Bayes prediction intervals are derived for the maximum and minimum of a future sample from the same distribution, but in the presence of a single outlier of the type 8,8. The prior of Q is assumed to be the gamma conjugate. A real example is given to illustrate the procedure. Also, the comparison between the values of the prediction bounds for dif...
Who Should be Interviewed? A Response from Cluster Analysis
Objective: This article presents an application of cluster analysis for social sciences researches especially those studies that have an interview as part of their data collection. This application is more suitable for sequential mixed method researchers who use quantitative data to frame subsequent qualitative subsamples for conducting interviews. Methods: In more detail, the algorithm (i....
A New Approximation for the Null Distribution of the Likelihood Ratio Test Statistics for k Outliers in a Normal Sample
Usually when performing a statistical test or estimation procedure, we assume the data are all observations of i.i.d. random variables, often from a normal distribution. Sometimes, however, we notice in a sample one or more observations that stand out from the crowd. These observation(s) are commonly called outlier(s). Outlier tests are more formal procedures which have been developed for detec...